Agricultural non-point source pollution and health of the elderly in rural China

Heavy application and high loss of chemical fertilizer are the major causes of agricultural non-point source pollution in China. Using fertilizer loss data and micro-level health data, this paper analyzes the effects of chemical fertilizer loss on the health of the rural elderly and on medical costs in China. Results of the difference-in-differences (DID) method indicate that a one kg/ha increase in fertilizer loss raises a key medical disability index (Activities of Daily Living, ADL) by 0.0147 (a 0.2 percent change) and the number of diseases by 0.0057 for rural residents aged 65 and older. This is equivalent to CNY 316 million (USD 45 million) in national medical costs. Furthermore, the age of onset is younger in regions with higher fertilizer loss. A one kg/ha increase in fertilizer loss advances the age of onset by 0.267 years, which will have long-term effects on public health. Our results are robust to a variety of robustness checks.

Dear Dr. Qiu,

Thank you for coordinating the review and for extending the invitation to revise our manuscript. We have taken your comments and the reviewers' comments seriously and have made significant modifications to the manuscript. Below, we summarize the revisions made in response to the two reviewers' comments, followed by point-by-point responses.

For Reviewer 1, we have:
1. Improved the writing and readability by providing clearer expressions and revising Figure 1.

For Reviewer 2, we have:
1. Revised the section on "parallel trend analysis" and explained the regression results in more detail.
2. Added the regression results of the propensity score matching (PSM) difference-in-differences (DID) model.
3. Offered a binary robustness check by changing the cutoff for the binary treatment effect.

We hope you find our revised manuscript acceptable for publication in PLOS ONE.

Sincerely,

Ying Wang
Cc. Hang Xiong and Chao Chen

Reviewer: 1

Thank you for the constructive comments on our manuscript. We have taken your comments seriously and substantially modified the manuscript. We believe our work has been significantly improved thanks to your comments. Please find below our response to each point you made, with your original comments in italics.
The manuscript is much improved. I have the following comments before this paper can be published. I have also highlighted some minor comments such as grammar mistakes, typos, and formatting errors in the revised manuscript (see pdf if applicable).

1. Abstract: "ADL daily life index of 0.0147.." How much is this change in percentage?

Response: Thank you very much for your question. The change of 0.0147 in the ADL index corresponds to a 0.2 percent change. We have added the percentages to the abstract. The empirical results are in Table 1, where the dependent variables are the logarithms of the health outcome variables.

Table 1 Empirical results for the logarithm of the health outcome variables.

| | ADL (1) | ADL (2) | No. of diseases (3) | No. of diseases (4) |
|---|---|---|---|---|
| Fertilizerloss*Time | 0.0020*** | 0.0017** | 0.0024 | 0.0047* |
| | (0.0006) | (0.0007) | (0.0030) | (0.0028) |
| Control | NO | YES | NO | YES |
| Year FE | YES | YES | YES | YES |
| ID FE | YES | YES | YES | YES |
| Observations | 32189 | 32165 | 4426 | 4395 |
| R-squared | 0.5618 | 0.5719 | 0.5245 | 0.6093 |

Notes: Robust standard errors clustered at the individual level are reported in parentheses. *, **, *** denote significance at the 10%, 5% and 1% levels, respectively.
2. Figure 1: Can you add something about the pathways, such as through drinking water?

Response: Following your advice, we have revised Figure 1.

4. Figure 2: Could you add a vertical line to show treatment time?

Response: Following your advice, we have added a vertical line to show the treatment time and have replaced Figure 2.

5. Table 3: Missing variable name for coefficients. Please add. Also, could you check the R2 for column (3)?

Response: We apologize for these errors. We have carefully corrected them in the revised manuscript.

6. What are the model specifications for TWFE?

Response: Thank you for your question. We use a two-way fixed effects (TWFE) model to estimate the health effect of fertilizer loss directly, controlling for both individual and time fixed effects.
$$H_{it} = \beta_1 \mathit{Fertilizerloss}_{it} + \gamma X_{it} + \mu_i + \lambda_t + \varepsilon_{it} \quad (1)$$

where $H_{it}$ denotes the health outcome; $\beta_1$ identifies the marginal effect of fertilizer loss; $X_{it}$ is a vector of control variables including individual, household, and provincial characteristics such as age, gender, health behavior, dietary habits, industrial pollution and hospitals; $\mu_i$ is a fixed effect unique to individual $i$; $\lambda_t$ is a time effect common to all individuals in year $t$; and $\varepsilon_{it}$ is an error term.

7. Table 4: Do individual fixed effects mean the individual person (survey participants) fixed effects or province fixed effects?
Response: In this paper, individual fixed effects refer to fixed effects for the individual person (survey participant). We have relabeled them as "ID FE" to make this clearer.

8. Section 3.2.1: "The differences between younger and middle olds are not significant in both ADL and No. of diseases". Do you expect that the effects are homogenous across ages? Are there any possible explanations if they are the same?

Response: Thank you for the questions. The health effects of pollution exposure are complex and do not follow a simple linear or near-linear relationship. The elderly often have higher susceptibility and higher mortality rates (Wong et al., 2015; Cohen and Gerber, 2017). The heterogeneous effects arise from age-related behaviors and physical functions (Tuttle et al., 2013; Rockwood et al., 1999). Variability in diseases and disabilities increases as people age, and older people are more likely to fall and be hospitalized (Speechley and Tinetti, 1991; Winograd, 1991). The younger elderly have better physical functions, so the effects on ADL and No. of diseases are not significant for them. However, the younger elderly also face health risks, and the elderly come to suffer from diseases related to fertilizer loss as they age. A significant effect on No. of diseases can therefore be observed among the middle olds.

9. Table 5: What are the p-values?

Response: The p-values give the probability of observing differences in coefficient estimates at least as large as those obtained, under the null hypothesis that there is no difference between groups. The model results reject the null hypothesis, which means that there is a significant difference between groups.
10. Section 3.3.1 Parallel trend test: Why are they divided into five periods?

Response: Thank you very much for your question. The survey data cover six waves (2000, 2002, 2005, 2008, 2011, 2014): two periods before the agricultural support policies and four periods after the treatment time. We choose the first period before the treatment time as the benchmark group, so five periods remain. We have restructured this part.
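As an illustration of how six waves yield five period indicators, the following Python sketch maps each survey wave to an event-time index and drops the base period. It is a minimal sketch and not the authors' code: the function names and the event-time indexing (-2, -1, 0, 1, ...) are assumptions made for illustration, while the wave years and the 2004 policy year come from the text.

```python
# Survey waves and policy year as stated in the manuscript.
WAVES = [2000, 2002, 2005, 2008, 2011, 2014]
POLICY_YEAR = 2004


def event_periods(waves, policy_year):
    """Map each wave to an event-time index; -1 is the last wave before the policy."""
    pre = sorted(w for w in waves if w < policy_year)
    post = sorted(w for w in waves if w >= policy_year)
    index = {w: i - len(pre) for i, w in enumerate(pre)}  # ..., -2, -1
    index.update({w: i for i, w in enumerate(post)})      # 0, 1, 2, ...
    return index


def period_dummies(wave, waves=WAVES, policy_year=POLICY_YEAR, base=-1):
    """Return one 0/1 indicator per non-base period for an observation in `wave`."""
    index = event_periods(waves, policy_year)
    periods = sorted(k for k in set(index.values()) if k != base)
    return {k: int(index[wave] == k) for k in periods}


if __name__ == "__main__":
    for wave in WAVES:
        print(wave, period_dummies(wave))
```

With the base period (the 2002 wave) omitted, every observation carries five indicators, and observations from the base wave have all of them equal to zero.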

11. Table S3: What do the superscripts for p-values mean?

Response: The p-values give the probability of observing differences in coefficient estimates at least as large as those obtained if there were no true difference between groups.
Response to comments in the PDF document:

1. Thank you very much for pointing out the typographical and grammatical errors. We have corrected them. We have also checked publications in the journal and confirmed that supporting information is cited as "S1 Fig" and "S1 Table".

2. We have revised the section of conclusions and discussion and shortened the text in Lines 344-347.

3. For response 9: We have added details about the average values of fertilizer loss to the note of Table 2 in our revised manuscript.

4. For response 17: There is a two-way causal relationship between medical supply and health. Because medical supply is only a control variable, we did not discuss this two-way causality. Further, we estimated the health effect of fertilizer loss after dropping the medical supply variable (results in Table 2). Negative associations between medical supply and health have also been found in other studies that controlled for medical supply (Lai, 2017; Ju, 2022).

Table 2 Estimates without the medical supply variable

| | ADL | No. of diseases |
|---|---|---|
| Fertilizerloss*Time | 0.0173*** | 0.0051*** |
| | (0.0061) | (0.0014) |
| Control | YES | YES |
| Year FE | YES | YES |
| ID FE | YES | YES |
| Observations | 32165 | 32329 |
| R-squared | 0.5564 | 0.6903 |

Notes: *, **, *** denote significance at the 10%, 5% and 1% levels, respectively.

Reviewer: 2

6. Your parallel trend analysis is appreciated. However, you might want to add an explanation on which figures are for low fertilizer regions and which are for the high fertilizer regions. Otherwise, I cannot understand. Also, the explanation "We divide the samples to five sections" is also confusing. You could consider rewriting this part.

Response: Thank you very much for your suggestion. To test whether the staggered introduction of agricultural policies constitutes a valid exogenous shock, we conduct a parallel trend test based on Equation (9), in which the policy indicator is a binary variable that takes the value of 1 when agricultural support policies are implemented.
The base year is the first period before the treatment. Panels A and B of S2 Fig show the estimates for fertilizer loss, and Panels C and D show the estimated coefficients when using a binary variable that equals 1 for high-loss areas. Following your advice, we have restructured the section on the parallel trend test. We added Equation (9) to explain how we conduct the parallel trend test, and we explained S2 Fig in more detail.

7. I am concerned about this treatment. Although in some cases, adding linear coefficients could work, health factors are sometimes not linear. Have the authors considered matching? If so, how does it go?

Response: Following your advice, we use three matching techniques to assess the health effect of fertilizer loss: nearest neighbor matching, radius matching and kernel matching. The matching results are reported in Table 3. The estimated coefficients are significant, which indicates that the basic results are robust. We have added Table 3 to the supporting information as S2 Table in our revised manuscript.

Table 3 Estimates of matching

| | Nearest neighbor: ADL | Nearest neighbor: No. of diseases | Radius: ADL | Radius: No. of diseases | Kernel: ADL | Kernel: No. of diseases |
|---|---|---|---|---|---|---|
| Fertilizerloss*Time | 0.0695** | 0.0143** | 0.0267*** | 0.0065*** | 0.0267*** | 0.0065*** |
| | (0.0273) | (0.0064) | (0.0085) | (0.0023) | (0.0085) | (0.0023) |
| Control | YES | YES | YES | YES | YES | YES |
| Year FE | YES | YES | YES | YES | YES | YES |
| ID FE | YES | YES | YES | YES | YES | YES |
| Observations | 4532 | 4532 | 17801 | 17801 | 17801 | 17801 |
| R-squared | 0.5927 | 0.7152 | 0.5828 | 0.7049 | 0.5828 | 0.7049 |

Notes: *, **, *** denote significance at the 10%, 5% and 1% levels, respectively.

8. I know it makes our life difficult if no package is available. However, assuming the level of fertilizer loss does not have different impacts is too strong. After all, this is the main argument of this study. I would suggest that, given the authors already adopt a binary treatment in the robustness check, the authors repeat the binary robustness check by calibrating the cutoff for the binary treatment effect.
Response: Following your advice, we have repeated the binary robustness check with different cutoffs for the binary treatment. We defined high-loss areas as those in the top 10%, 20%, 30% and 40% of fertilizer loss intensity, respectively. Table 4 reports the estimated coefficients from Equation (5). The basic regression results for the binary treatment remain robust across these cutoffs.
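The alternative cutoffs can be sketched in Python as follows. This is only an illustrative sketch: the region labels and loss values are hypothetical, and the ranking rule (the top share of regions by loss intensity) is an assumption about how the thresholds are applied rather than the authors' exact procedure.

```python
def high_loss_flags(loss_by_region, top_share):
    """Flag regions whose fertilizer loss intensity is in the top `top_share` fraction."""
    ranked = sorted(loss_by_region, key=loss_by_region.get, reverse=True)
    n_high = max(1, round(len(ranked) * top_share))
    high = set(ranked[:n_high])
    return {region: int(region in high) for region in loss_by_region}


# Hypothetical loss intensities (kg/ha) for ten hypothetical regions.
example_loss = {f"R{i:02d}": float(i) for i in range(1, 11)}

if __name__ == "__main__":
    for share in (0.10, 0.20, 0.30, 0.40):
        flags = high_loss_flags(example_loss, share)
        treated = sorted(r for r, f in flags.items() if f == 1)
        print(f"top {share:.0%}: {treated}")
```

Each set of flags would then replace the binary treatment variable in the regression, one cutoff at a time.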

China has experienced rapid growth in chemical fertilizer use, particularly after 2004, when the Chinese government shifted from taxing agriculture to subsidizing agricultural programs. Chemical fertilizer use per unit area is now 4 times more than the world average [1]. Agricultural nonpoint-source pollution is a growing concern in China because it has become the main cause of water pollution [2].

Specifically, the elderly might face a higher health risk than young people under environmental exposure. Higher susceptibility and higher mortality rates are observed among adults over the age of 75 years compared with younger ages [7][8][9].

The rest of the paper is organized as follows. Section 2 provides the data description and summary statistics. Section 3 presents the empirical models and results, and Section 4 concludes.

The individual health data come from the Chinese Longitudinal Healthy Longevity Survey (CLHLS).

CLHLS is chosen for this research because of its rich health and demographic information, its nationwide sampling areas, and its time span, which covers the year of the agricultural policy change. The CLHLS was conducted in a randomly selected half of the counties and cities in 22 provinces, covering 85% of the total population in China. Its eight waves (1998, 2000, 2002, 2005, 2008, 2011, 2014 [...]

The ADL index is widely used to measure the dependence of the elderly and physically inactive people.

It is also known as the "disability" index [34]. The ADL index is constructed by asking participants whether they need help with six basic activities (bathing, dressing, eating, going to the toilet, indoor movement, and controlling defecation). The choices for each item are "can do without help" (score 1), "need help" (score 2), and "need full help" (score 3). The final ADL score is the sum of all six items, ranging from 6 to 18. Higher scores indicate worse health status.
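The scoring rule described above can be written as a short Python sketch. The item keys and the function name are hypothetical labels chosen for illustration; only the six activities, the 1 to 3 scoring, and the 6 to 18 range come from the text.

```python
# Hypothetical keys for the six basic activities named in the text.
ADL_ITEMS = (
    "bathing",
    "dressing",
    "eating",
    "toileting",
    "indoor_movement",
    "continence",
)

# Answer options and their scores as described in the text.
SCORES = {
    "can do without help": 1,
    "need help": 2,
    "need full help": 3,
}


def adl_index(responses):
    """Sum the six item scores; the result lies between 6 (best) and 18 (worst)."""
    missing = [item for item in ADL_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing ADL items: {missing}")
    return sum(SCORES[responses[item]] for item in ADL_ITEMS)


if __name__ == "__main__":
    independent = {item: "can do without help" for item in ADL_ITEMS}
    needs_bathing_help = dict(independent, bathing="need help")
    print(adl_index(independent), adl_index(needs_bathing_help))
```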
The number of diseases measures how many types of diseases related to fertilizer loss a sample individual suffers from. Although the CLHLS database investigates 16 diseases in the elderly, including hypertension, heart disease, cancer, stroke, cardiovascular and cerebrovascular diseases, bronchitis, emphysema, asthma, and pneumonia, only three diseases (hypertension, cancer, and gastrointestinal ulcer) are related to fertilizer loss in the existing medical literature.

The age of onset refers to the age at which a sample individual first suffered from any of the three diseases (i.e., hypertension, gastrointestinal ulcer, or cancer).

[...] which respondents eat meat, fish, egg, and salt-preserved vegetables. Options for each item are "rarely or never", "not every month, but occasionally", "not every week, but at least once per month", "not every day, but at least once per week", and "almost every day", with scores from 1 to 5, respectively. The questionnaire also asks about the sources of respondents' drinking water. Options include a well, a river or lake, a spring, a pond or pool, and tap water. A binary variable (Water) is constructed, which equals one if respondents drink surface water, i.e., water from a river or lake or a pond or pool, and equals zero otherwise.

The descriptive statistics of all variables used in the regression are summarized in Table 1.

[...] other hand, the programs might distort agricultural factors and cause negative effects on the soil and water environment [40]. Table 2 reports the differences in health outcomes and fertilizer loss between regions with high versus low intensity of fertilizer loss. Column (3) reports the differences in the ADL index, the number of induced diseases, the intensity of fertilizer input, the amount of fertilizer loss and the intensity of fertilizer loss before the implementation of agricultural support policies. The differences after the agricultural support policies are in column (6). Column (7) shows a widening gap in health outcomes, fertilizer loss and fertilizer input between regions with high and low fertilizer loss after the agricultural support policies. Preliminary data analysis shows that the loss of chemical fertilizer in high-loss areas became more serious after the agricultural subsidy policies.
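The first and second differences described for Table 2 can be sketched in Python. The sketch assumes that column (3) is high loss minus low loss before the policies, column (6) is the same difference after the policies, and column (7) is column (6) minus column (3). The input values in the usage lines are hypothetical and are not taken from Table 2.

```python
def table2_differences(low_before, high_before, low_after, high_after):
    """Return (first difference before, first difference after, second difference)."""
    first_before = high_before - low_before  # assumed column (3)
    first_after = high_after - low_after     # assumed column (6)
    second = first_after - first_before      # assumed column (7)
    return first_before, first_after, second


if __name__ == "__main__":
    # Hypothetical group means of an outcome such as the ADL index.
    before_gap, after_gap, widening = table2_differences(
        low_before=7.0, high_before=7.2, low_after=7.1, high_after=7.6
    )
    print(f"gap before: {before_gap:.2f}, gap after: {after_gap:.2f}, "
          f"second difference: {widening:.2f}")
```

A positive second difference corresponds to the widening gap between high-loss and low-loss regions that the text describes.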

Table 2. Comparison of high-loss and low-loss areas before and after agricultural support policies.

[Table 2 columns: Variables; Before agricultural support policies; After agricultural support policies: Low loss (4), High loss (5), 1st difference (6); 2nd difference]

Notes: Fertilizer loss is a continuous variable; we dichotomized it at its pre-policy mean to compare the health effects in different regions. The average value of fertilizer loss is the national average, which equals 5.36 kg/ha. Fertilizer input and fertilizer loss are shown in S1 Fig.

The CLHLS contains three diseases (namely hypertension, gastrointestinal ulcer and cancer) induced by fertilizer loss. The age at which a sample individual first suffers from any of the three diseases is defined as the age of onset.

[...] is specified as: [...]

The relationship between medical cost and health can be identified in the following equation:

Age of onset and fertilizer loss
where $Y_{it}$ represents medical costs, and other variables are defined as before.

Table 3 reports the key coefficients in Equation (4) and Equation (5). [...] (4) a one kg/ha increase in fertilizer loss is associated with a 0.0057 rise in the number of diseases (0.5 percent).

These results indicate that fertilizer loss significantly increases the elderly's health risks and diseases. For different regions, the estimates in columns (2) and (5) of Table 3 [...]

Table 3. Basic estimates for the ADL index and diseases.

The estimated results without control variables are in S1 Table. Considering nonlinear health factors, we further estimated the health effects of fertilizer loss using a propensity score matching (PSM) difference-in-differences (DID) model. The results are in S2 [...] can be observed in the oldest olds.

The estimates are positive and significant for both age groups (columns (3) and (4) of S3 Table). Fertilizer loss increases the risk of diseases in both the younger olds and the oldest olds. The [...]

Effect of fertilizer loss on age of onset
This study further explores whether fertilizer loss leads the elderly to become sick at a younger age. Though [...]

In this section, we further explore these results by translating the estimated effects into expected monetary losses. We do this by exploiting the relation between ADL and medical cost in Equation (8). The estimated coefficient in column (1) of Table 5 indicates that each additional unit increase in the ADL index increases the medical cost by CNY 244.6 (USD 34.94). Columns (2) and (3) [...]

[...] where $D_{it,k}$ is a binary variable that takes the value of 1 when agricultural support policies are implemented.
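The translation of ADL effects into medical costs described in this section can be sketched as a back-of-the-envelope calculation in Python. It multiplies the ADL effect of a one kg/ha increase in fertilizer loss (0.0147, from the abstract) by the cost per ADL unit (CNY 244.6). The population argument is a hypothetical input, and the sketch does not attempt to reproduce the national figure reported in the paper, which may rest on additional components.

```python
ADL_EFFECT_PER_KG_HA = 0.0147    # ADL change per one kg/ha of fertilizer loss (abstract)
COST_PER_ADL_UNIT_CNY = 244.6    # medical cost per ADL unit (Table 5, column (1))
CNY_PER_USD = 244.6 / 34.94      # exchange rate implied by CNY 244.6 = USD 34.94


def cost_per_person_cny(adl_effect=ADL_EFFECT_PER_KG_HA,
                        cost_per_unit=COST_PER_ADL_UNIT_CNY):
    """Expected extra medical cost per person for a one kg/ha increase in loss."""
    return adl_effect * cost_per_unit


def aggregate_cost_cny(population, per_person=None):
    """Scale the per-person cost to a population (a hypothetical input)."""
    if per_person is None:
        per_person = cost_per_person_cny()
    return population * per_person


def to_usd(amount_cny, rate=CNY_PER_USD):
    """Convert CNY to USD at the implied exchange rate."""
    return amount_cny / rate


if __name__ == "__main__":
    per_person = cost_per_person_cny()
    print(f"per person: CNY {per_person:.2f} (USD {to_usd(per_person):.2f})")
    # Hypothetical population size, for illustration only.
    print(f"per million people: CNY {aggregate_cost_cny(1_000_000):,.0f}")
```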

Conclusions and discussion
China's agricultural fertilizer use faces the problem of "high input, high loss, and high pollution", which has been aggravated by agricultural support policies since 2004. These "Three Highs" of chemical fertilizer use have polluted the environment and damaged the health of rural residents. In this paper, chemical fertilizer loss data and CLHLS micro-health data are used to verify the health effect of [...]

The X-axis represents the estimated coefficients from 1000 random assignments.

The curve is the estimated kernel density distribution. The vertical line is the true estimate in columns (2) and (4) of Table 3.

Agricultural nonpoint-source pollution has also been found to be an important factor affecting health [3][4][5][6].

The age of onset comparisons across age groups are represented by [...]

[...] effects by splitting the sample into various age groups and reporting estimates of the key coefficients for these [...]

[...] $\varepsilon_{it}$ is an error term.

These results indicate that fertilizer loss significantly increases the elderly's health risks and diseases. For different regions, the estimates in columns (2) and (5) of Table 3 indicate an increase in the ADL index and the number of diseases for high-loss areas versus low-loss areas. Compared with low-loss areas, the ADL index of the rural elderly is 0.1222 higher in areas with high chemical fertilizer loss, while the number of diseases is 0.0487 higher. We also use the two-way fixed effects model to estimate the health effect of fertilizer loss. Columns (3) and (6) of Table 3 [...]

In columns (1) and (2) of S3 [...] (4) of S3 Table), fertilizer loss increases the risk of diseases in both the younger olds and the oldest olds [...] is larger than that for younger and middle olds, and the differences are significant.

Table 4 reports the key coefficients in Equation (6) and Equation (7). [...]

Results of health conditions on medical cost
[...] sampling ensures that the model constructed in this paper has no effect on the health level of the elderly. In this study, random sampling is carried out 1000 times, and the benchmark regression is performed according [...]

To verify the robustness of the conclusions of this study, different samples are used to test the robustness. This paper focuses on the health effect of agricultural fertilizer loss on the rural elderly. Therefore, samples from major agricultural production areas, major rice-producing areas and water resource-rich areas are used to carry out the basic regression of Equation (4) [...]

